Fitted Modelling
Terminology
Regression: looks for relationships among variables. It is used to determine how multiple variables
are related or to predict a value.
Correlation coefficient (Pearson's r): the covariance of two variables divided by the product of their
standard deviations. It ranges from -1 to 1.
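As a rough illustration of the definition above, the following sketch computes the correlation coefficient from the covariance and the standard deviations. The function name pearson_r and the sample data are made up for this example, and the population (1/n) versions of covariance and standard deviation are assumed.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance and standard deviations (assumed 1/n normalisation;
    # the normalisation cancels in the ratio, so 1/(n-1) would give the same r).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

# Hypothetical data: ys is an exact increasing linear function of xs.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
r = pearson_r(xs, ys)
print(r)
```

For a perfectly increasing linear relationship such as this one, r should come out at 1 up to floating-point rounding; negating ys should flip the sign.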
Residuals = the differences between the observed values and the predicted values (observed minus predicted)
Ordinary least squares (OLS) = chooses the model coefficients that minimise the sum of squared residuals (SSR)
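A minimal sketch of residuals and OLS for a single predictor, assuming a straight-line model y = intercept + slope * x. The helper names (fit_line, residuals, sum_squared_residuals) and the data are hypothetical; the closed-form slope and intercept are the standard least-squares solution for this one-variable case.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the OLS line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residuals(xs, ys, intercept, slope):
    """Observed value minus predicted value for each point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def sum_squared_residuals(res):
    return sum(e ** 2 for e in res)

# Hypothetical data for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 4.0, 8.0]

intercept, slope = fit_line(xs, ys)
res = residuals(xs, ys, intercept, slope)
ssr = sum_squared_residuals(res)
print(intercept, slope, ssr)
```

Any other line through the same points, for example one with a slightly different slope, should give a larger SSR than the fitted one, which is what "minimises" means here.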
Polynomial Regression
Bayesian information criterion (BIC): a model selection score that includes a penalty for using more
variables; lower values are preferred. Other similar measures include the adjusted R².
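To make the penalty concrete, here is a small sketch that compares a degree-0 polynomial (a constant) with a degree-1 polynomial (a line) on the same data. It assumes the common least-squares form BIC = n * ln(SSR / n) + k * ln(n), which holds up to an additive constant under Gaussian errors, with k the number of fitted coefficients (degree + 1). The function name bic_least_squares and the data are hypothetical.

```python
import math

def bic_least_squares(ssr, n, k):
    """BIC for a least-squares fit, assuming Gaussian errors (constants dropped)."""
    return n * math.log(ssr / n) + k * math.log(n)

# Hypothetical data for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 4.0, 8.0]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Degree 0: the best constant prediction is the mean of ys (k = 1 coefficient).
ssr_const = sum((y - mean_y) ** 2 for y in ys)

# Degree 1: OLS line (k = 2 coefficients).
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x
ssr_line = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

bic_const = bic_least_squares(ssr_const, n, k=1)
bic_line = bic_least_squares(ssr_line, n, k=2)
print(bic_const, bic_line)
```

The extra coefficient is only worth it when it reduces the SSR by enough to outweigh the added k * ln(n) term; with equal SSR, the model with more coefficients always gets the higher (worse) BIC.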
• Poor fit due to high bias is called under-fitting
• Poor fit due to high variance is called overfitting
Split the available data into two non-overlapping parts: a training set used to fit the model and a
test set used to evaluate it on unseen data.
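One possible way to do such a split is sketched below. The function name train_test_split, the 25% test fraction, and the fixed seed are assumptions chosen for this example, not something prescribed by these notes.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Randomly split rows into non-overlapping training and test lists."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = int(round(len(rows) * test_fraction))
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test

# Hypothetical (x, y) pairs for illustration only.
data = [(x, 2.0 * x + 1.0) for x in range(20)]
train, test = train_test_split(data)
print(len(train), len(test))
```

The model would then be fitted on train only, and its error measured on test, so that overfitting shows up as a gap between the two errors.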
Bias: measures how much the average prediction (taken over all data sets) differs from the desired regression function.
Variance: measures how much the predictions for individual data sets vary around their average.
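The two definitions above can be estimated by simulation: draw many data sets, fit a model to each, and look at the predictions at one point. The sketch below does this for a constant model and a straight-line model. Everything in it is an assumption made for illustration: the target function x^2, the noise level, the evaluation point x0 = 2, and all function names.

```python
import random

def true_f(x):
    return x * x  # hypothetical "desired regression function"

def mean_model(xs, ys, x0):
    """Constant model: always predicts the mean of the training targets."""
    return sum(ys) / len(ys)

def line_model(xs, ys, x0):
    """OLS straight line fitted to (xs, ys), evaluated at x0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return mean_y + (sxy / sxx) * (x0 - mean_x)

def simulate_predictions(model, x0, n_datasets=200, n_points=10, noise=0.5, seed=0):
    """Fit the model to many independently drawn data sets; collect predictions at x0."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_datasets):
        xs = [rng.uniform(0.0, 2.0) for _ in range(n_points)]
        ys = [true_f(x) + rng.gauss(0.0, noise) for x in xs]
        preds.append(model(xs, ys, x0))
    return preds

def bias_and_variance(preds, target):
    avg = sum(preds) / len(preds)
    bias = avg - target                                        # average prediction vs. target
    variance = sum((p - avg) ** 2 for p in preds) / len(preds)  # spread around the average
    return bias, variance

x0 = 2.0
target = true_f(x0)
preds_mean = simulate_predictions(mean_model, x0)
preds_line = simulate_predictions(line_model, x0)
bias_mean, var_mean = bias_and_variance(preds_mean, target)
bias_line, var_line = bias_and_variance(preds_line, target)
print(bias_mean, var_mean)
print(bias_line, var_line)
```

Under these assumptions the constant model is too rigid to follow x^2 and should show the larger bias (under-fitting). For either model, the mean squared error of the predictions against the target equals bias squared plus variance.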